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A  Simplified  R-action-Free  Integer  Gauss  Elimination  Algorithm 

Peter  R  Turner 

Mathematics  Department,  U  S  Naval  Academy,  Annapolis,  MD  21402 

Abstract  This  paper  presents  a  new  version  of  Gauss  elimination  for  integer 
arithmetic.  new  algorithm  allows  fraction-free  integer  computation  without 

requiring  any  fAlk  to  a  greatest  common  divisor  routine.  It  does  however  keep  the 
growth  in  the  integer  dynamic  range  to  a  minimum.  The  algorithm  is  b^d  on  a 
careful  comparison  of  the  divisionless  integer  GE  and  the  “normal”  algorithm  using 
divisions  wdthin  a  floating-point  or  real  arithmetic  setting.  FVom  this  analysis,  we 
identify  common  factors  which  are  necessarily  present  throughout  the  active  j^t 
of  the  matrix.  These  can  then  be  removed  by  exact  integer  division.  A  further 
consequence  of  this  analysis  is  that  the  diagonal  entries  of  the  final  upper  tnangulm: 
factor  are  precisely  the  determinants  of  the  principal  minors  of  the  origin^  matrix. 
In  a  parallel  processing  environment,  the  additional  cost  of  these  integer  divisions  is 
rr.ir^iTnWj^  since,  at  each  stage,  the  w^iole  active  array  is  being  divided  by  the  same 
integer. 


1.  Introduction 

In  some  form  Gauss  elimination,  or  GE  as  we  shall  often  abbreviate  it  here,  is  probably 
the  most  widely  used  single  computational  tool  in  scientific  computing  for  a  verj'  wide 
range  of  underlying  problems.  These  include  solution  of  partial  differential  equations, 
linear  programming  and  least  squares  approximation  in  its  various  guises  such  as  signal 
processing  and  linear  regression.  The  basic  problems  for  which  it  is  used  are  the  solution 
of  linear  systems  of  equations  (including  obtaining  the  solution  space  for  underdetermined 
systems),  computation  of  matrix  determinants,  determination  of  matrix  rank  or  detection 
of  singularity.  Gauss  elimination  is  usually  to  be  found  as  a  major  topic  m  ^ts  on 
linear  algebra  (where  the  emphasis  is  on  its  theoretical  basis),  numerical  analysis  (with  an 
emphasis  on  its  practical  implementation,  paying  attention  to  questions  of  roundoff  error 
and  numerical  stability),  parallel  and  vector  computing  (as  an  important  illustration  of  the 
potential  power  of  parallel  computers).  See  [l],  [3],  [5],  [10],  [11]  for  examples.  However 
few  of  these  discuss  in  any  detail  the  interplay  between  these  different  perspectives.  It 
is  precisely  this  interplay  which  leads  to  the  improved  integer  GE  algorithm  which  is  the 

subject  of  this  paper.  .  *  r  i  ♦ 

We  begin  with  a  brief  review  of  the  basic  GE  algorithm  and  some  variations  of  this. 

Section  2  also  contains  a  brief  review  of  the  main  applications  and  of  the  complexity 
analvsis.  Section  3  introduces  integer  GE.  The  requirements  of  integer  arithmetic  raises 
some  different  issues.  We  describe  first  the  division-free  form  of  the  algorithm  and  then 
the  questions  raised  by  this  -  most  notably,  the  growth  of  the  dynamic  range  that  is 
needed.  Some  of  these  issues  were  previously  discussed  for  the  specific  context  of  residue 

number  systems  in  [6],  [7],  [16].  .  ,  ,  .  . 

In  Section  4,  an  improved  fraction-free  integer  GE  algorithm  is  developed  from  a 
comparison  of  the  matrbc  entries  in  the  divisionl^s  algorithm  and  those  for  the  usual 
GE  algorithm.  A  short  discussion  of  the  use  of  greatest  common  divisors  to  reduce  the 
dynamic  range  growth  is  also  included  before  the  new  algorithm  is  presented.  Its  use 
in  the  various  standard  applications  is  also  described.  Section  -o  returns  to  the  qu^tion 
of  algorithm  complexity  and  includes  some  consideration  of  the  demands  of  (potentially) 
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long  integer  wordlength  arithmetic.  A  brief  comparison  with  the  use  of  GE  with  rational 
arithmetic  is  included.  The  paper  concludes  with  a  very  short  section  on  extensions  of 
the  algorithms  presented  here  to  other  algebraic  settings. 

2.  The  basic  algorithms 

2.1.  Problem  statement  and  notation.  Except  where  specifically  stated  we  shall 
consider  a  square  n  x  n  matrix  A  although  some  of  the  problems  and  solutions  are  sim¬ 
ilarly  valid  for  rectangular  matrices.  Elements  of  the  matrix  A  will  be  denot^  by  a,j. 
Its  determinant  is  denoted  det  A  and  its  rank  by  r(A).  The  three  basic  problei^  wj 
are  concerned  with  here  are  the  solution  of  systems  of  equations,  computing  det  A  an 
determining  r  (A).  For  definitions  of  any  of  these,  see  a  standard  text  such  as  [IJ. 

In  the  case  of  systems  of  equations,  we  use  the  notation 

Ax  =  b 


and  the  elements  of  the  unknown  and  right-hand  side  vectors  are  denoted  li,  i>i. 

The  underlying  principle  of  GE  is  that  multiples  of  the  first  equation  (or  row  of  the 
matrbc)  are  subtracted  from  all  subsequent  equations  (rows)  to  eliminate  the  first  unknown 
(element)  from  each  of  these.  The  process  is  then  repeated  to  eliminate  entries  below  the 
diagonal  of  each  column  in  turn.  The  system  of  equations  is  then  solved  by  substitution 
in  the  resulting  triangular  system.  Alternatively  the  detta-minant  of  the  original  matrix  is 
given  by  the  product  of  the  diagonal  entries  in  the  resulting  matrbc.  The  rank  is  given  by 
counting  the  number  of  nonzero  elements  on  the  diagonal  of  the  final  array.  In  the  case 
of  solving  a  system  of  equations,  whatever  operations  are  performed  on  the  matrix  must 
also  be  performed  on  the  right-hand  side. 

Of  course,  these  statements  are  oversimplifications  of  the  true  situation  but  they  con¬ 
tain  the  essence  of  the  approach.  In  the  rest  of  this  section,  we  present  the  general  version 
of  this  simple  GE  procedure  and  discuss  some  of  the  difficulties  that  can  arise. 


2.2.  The  ijk-form.  The  “ijk-form”  of  GE  is  just  the  general  n  x  n  version  of  the 
algorithm  outlined  above. 

Algorithm  1  Basic  GE  ijk  form 

Input  n  X  71  matrix  A  (and  right-hand  side  b  if  solving  a  system) 

Co77ipuie 

for  i  =  1  to  n  —  1 

for  J  =  7  -h  1  to  71 
771  .  —  0>j  i  /  I 
aji  :=  0 

:=  bj  -  7i\h,  (if  solving  a  system) 
for  =  7  +  1  to  71 

(^•jk  ^jk  lllO-ik 

Output  (modified)  matrix  A  (and  b  if  solving  a  system) 

Pivoting  is  the  name  given  to  altering  the  order  in  which  die  rows  are  used  in  the 
GE  algorithm.  The  simplest  cause  for  the  need  for  pivoting  is  the  occurrence  of  a  0 
in  the  appropriate  diagonal  position  so  that  Algorithm  1  breaks  down.  In  the  floating¬ 
point  enviroiinient  an  almost  equally  difficult  situation  is  created  by  a  very  small  diagonal 
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entry  which  mav  be  just  the  result  of  roundoff  error  in  a  quantity  which  would  be  0  if 
exact  arithmetic  were  being  used.  Such  a  pivot  element  can  result  in  large  errors  in  the 
computed  solution.  The  usual  pivoting  strategy  is  partial  ptvoUng  m  whidi  the  current 
i-th  column  is  searched  for  its  element  of  largest  magnitude  on  or  below  the  diagonal.  The 
row  in  which  this  occurs  is  then  interchanged  with  the  i-th  row  before  the  elimination 

''°“Sr''manv  purposes,  it  is  desirable  to  make  better  use  of  the  work  involved  in  GE. 
By  storing  the  multipliers  used,  we  obtain  the  LU  factorization  of  the  oripnal  inatrix  A. 
That  is  we  find  a  lower  triangular  factor  L  and  an  upper  triangular  one  L  such  that 

-4  =  LU 


or 


,  in  the  case  of  pivoting,  L,  U  are  factors  of  a  permuted  version  of  the  matrix  A: 

PA  =  LU 


For  this  form  of  the  algorithm,  the  factors  L  and  U  are  stored  in  the  same  locations  as  the 
original  matrix  A.  The  lower  factor  L  has  unit  diagonal  entries  and  so  these  need  not  be 
stored  explicitly.  The  principal  advantages  of  the  LU  factorization  lie  in  the  fact  that  if 
multiple  systems  are  to  be  solved  with  the  same  coefficient  matrix  then  the  factorization 
can  be  done  just  once  and  each  system  solved  using  forward  and  back  substitution  loops. 
This  is  then  much  cheaper  computationally  than,  for  example,  inverting  the  matrix. 

For  simplicity  in  the  descriptions  of  the  various  integer  algorithms  in  the  remainder 
of  this  paper,  we  shall  restrict  our  attention  to  the  basic  GE  algorithm.  The  inclusion  of 
pivoting  is  straightforward  —  though  it  is  not  obvious  what  constitute  a  good  pivoting 

strategy'  for  integer  GE.  . 

There  are  several  variations  on  the  basic  j  jfc-form  of  GE  which  have  merits  for  different 
computing  environments.  The  ijfc-form  in  Algorithm  1  is  well-suited  to  a  conventional 
serial  computer  architecture  with  matrices  stored  by  rows.  For  other  storage  schemes 
or  processor  architectures,  alternatives  may  be  preferred.  A  detailed  discussion  of  these 
aspects  is  included  in  [11].  These  \ariations  can  take  advantage  of  processors  which  are 
designed,  for  example,  for  efficient  performance  of  the  saxpy's  of  [5]  or  of  scalar  products. 

2.3.  Applications. 

Solving  linear  systems.  The  most  frequent  application  of  GE  and  LU  factorization 
is  to  the  solution  of  a  linear  system 

Ax  =  b 


Then  Algorithm  1  results  in  an  equi\-alent  upper  triangular  system 

t’x  =  b 


(1) 

(2) 


whose  solution  is  the  same  as  that  of  (1).  This  system  is  then  solved  using  back  substMu- 
Hon. 

Algorithm  2  Back  substitution 

Input  n  X  n  upper  triangular  matrix  U,  n- vector  b 

JiiiUolize  solution  x  :=  0 

Compute 
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for  i  =  n  downto  1 
for  j  =  f  +  1  to  n 

bi  I—  ”  UijXj 
Xj  1=  b{/lLix 

Output  solution  x 

Remark  1.  Note  that  b  iias  been  used  fin  place  ofh)  to  denote  the  right  hand  side  of 
(2)  in  this  algorithm. 

This  version  of  back  substitution  has  been  arranged  in  such  a  way  that  the  outer  loop 
results  in  obtaining  one  further  component  of  the  solution  each  time.  Again  there  are 
variations  which  are  better  suited  to  particular  architecture. 

The  computational  complexity  of  floating-point  algorithms  is  often  measured  by  the 
number  of  floating-point  arithmetic  operations  that  are  required.  Traditionally,  the  rela¬ 
tive  efficiency  of  algorithms  was  measured  by  counting  multiplications  and  divisions  and 
neglecting  addition  and  subtraction  operations.  For  modern  processors  the  time  required 
for  floating-point  multiplication  is  not  much  greater  than  that  for  addition  and  so  it  is 
sensible  to  consider  all  arithmetic  operations.  Division  still  costs  substantially  more  and 
any  elementary  function  evaluations  are  typically  yet  more  expensive.  The  floating-point 
operation  counts  for  GE  are  well-known.  They  can  be  found  in  almost  any  standard  text 
on  numerical  analysis  or  computational  linear  algebra  such  as  [3]  or  [5]. 


TABLE  1  Arithmetic  operation  counts  for  GE  solution  of  Ax  —  b. 


Operation 

Forward  elimination 

Back  substitution 

TOTAL 

H-  or  — 

X 

/ 

in  (n^  - 1) 

|n  n2  - 1) 

in(n-  1) 
in(n-  1) 
n 

in  (n—  1)  (2n  +  5) 
|n  (n  —  1)  (2n  +  5) 
|n(n  + 1) 

If  multiple  systems  with  the  same  coefficient  matrix  are  to  be  solved  then  the  overall 
operation  count  for  GE  is  obtained  by  multiplying  these  totals  by  the  number  of  systems. 
In  particular  the  inversion  of  the  matrbc  A  would  therefore  result  in  multiplying‘all  these 
totals  by  n  so  that  matrix  inversion  by  GE  is  an  Ofn'*)  operation.  The  corresponding 
table  of  operation  counts  for  LU  factorization  makes  it  easy  to  see  the  practical  advantage 
of  using  the  LU  factorization  whenever  multiple  systems  are  to  be  solved. 

Determinant  evaluation.  Whether  we  use  GE  or  LU,  the  evaluation  of  det  A  is 
extremely  simple  once  the  elimination  phase  is  complete,  (^^e  are  taking  no  account  of 
any  questions  of  error  analysis  or  stability  at  this  stage.)  In  the  absence  of  pivoting  the 
determinant  is  simply  the  product  of  the  diagonal  elements  of  U. 

Rank  and  singularity  detection.  Again,  neglecting  any  problems  created  by 
roundoff  errors  or  ill-conditioning  in  the  matrix,  once  GE  with  pivoting  has  been  com¬ 
pleted,  singularity  is  detected  simply  by  seeking  a  0  entry  in  a  pivot  position.  In  fact  it 
is  sufficient  to  examine  Onn  since,  if  any  pivot  element  is  zero,  then  necessarily  a„n  =  0. 
Again  note  we  are  assuming  both  pivoting  and  exact  arithmetic.  In  practice,  for  floating¬ 
point  arithmetic  at  least,  this  is  not  sufficient  and  more  care  must  be  taken  to  test  for 
near-singularity.  Extending  our  ideal  world  analysis,  by  counting  the  numbei  of  zeio 
pivots  we  obtain  the  rank-deficiency  of  the  matrix.  E<iuivalently  the  number  of  nonzero 
pivots  yields  the  rank  of  A.  This  is  a  much  more  optimistic  claim  than  even  the  singularity 
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statement.  Within  a  computer  algebra  system  or  with  exact  arithmetic  such  statements 

vslid*  ,  «  • 

In  the  domain  of  real  numerical  computation,  the  question  of  rank  determmation  is 

more  difficult.  The  effect  of  roundoff  error  on  GE  has  [of  ^ 

of  that  analysis  is  summarized  in  Section  2.5.  Packages  such  as  MAT  (  [  ], 

example)  include  rank  as  a  function  in  their  computational  library  and  u^  a  carefu  y 
computed  tolerance  dependent  on  parameters  of  the  computer  system  to  d^ermine  e 
“true”  rank  of  the  original  matrix  from  its  Singular  Value  Decomposition  (S\T))  [5].  Such 
aspects  are  not  the  focus  of  this  paper. 

The  solution  space  for  underdetermined  systems.  In  the  case  of  underdeter¬ 
mined  systems  of  equations,  whether  this  is  the  result  of  having  fewer  equations  than 
unknowns  or  of  rank  deficiency  in  the  matrbc,  a  set  of  vectors  spanning  the  solution  space 
is  easy  to  obtain  from  the  LU  factorization  of  the  matrix. 

The  backward  substitution  must  be  performed  the  same  number  of  times  as  the  rank 
deficiency.  Thus  if  the  n  x  n  matrix  has  rank  r,  we  solve  the  smaller  system  n  -  r  times 
each  time  with  r  of  the  unknowns  specified.  To  guarantee  linearly  independent  solutions, 
it  suffices  to  use  the  standard  basis  vectors  ei,  62, .  •  • ,  e„_r  in  turn  to  specify  the  va  ues  o 
That  is,  we  first  set  (x.^i,x.^2,. •  - ,Xn)  =  and  ^Ive  for 

the  remaining  unknowns.  The  process  is  repeated  for  (Xr^i ,  Xr-r2,  •  •  • ,  j  u, . . . ,  ; 

and  so  on  until  we  have  used  (xr+i,iCr+2)- •  •  t^n)  =  (9i- •  •  >95 1)- 

The  algorithms  are  simple  extensions  of  those  outlined  earlier.  Similar  basis  vectors 
could  be  used  in  more  abstract  settings  with  symbolic  computer  algebra  systems. 

3.  G.\uss  elimination  over  the  integers 
For  computation  over  the  integers,  it  is  necessary  to  make  changes  to  the  basic  GE  algo¬ 
rithms  described  in  Section  2.  The  most  obvious  cause  is  the  fact  that  the  integers  are  not 
closed  under  division.  This  has  the  side  effect  that  the  magnitudes  of  integers  generat^ 
during  the  computation  can  grow  rapidly.  The  range  of  integer  values  required  or  a%ailable 
is  known  as  the  dynamic  range.  Overflowing  the  integer  range  in  binary  integer  arithmetic 
often  results  in  the  phenomenon  known  as  “integer  wraparound”  (see  [3]  Chapter  1,  lor 
example)  which  is  the  effect  of  the  “clock”  arithmetic  modulo  2  where  A  is  the  integer 
wordlength  in  bits.  Further  complications  that  arise  out  of  integer  arithmetic  (however  it 
is  performed)  include  choosing  a  good  pivoting  strategy. 

In  a  Computer  Algebra  System  (CAS)  such  as  Maple^  [2]  or  Mathematica  [20],  some 
of  these  problems  are  avoided  at  the  expense  of  computational  speed  since  exact  arithmetic 
with  very  long  integers  can  be  performed  in  software.  However  such  computation  becomes 
very  slow  when  the  arithmetic  wordlength  gets  long  —  and  the  rate  of  growth  ol  t  le 
dynamic  range  can  be  very  rapid. 

■  One  wav  of  overcoming  some  of  the  difficulties  is  the  use  of  alternative  representation 
and  arithmetic  formats  for  the  integers.  The  residue  number  systems  RNS  are  well-suited 
to  some  of  these  tasks.  In  such  a  system,  an  integer  is  represented  by  its  residue  modulo  a 
number  of  different  prime  numbers.  The  advantage  here  is  that  the  gi  owth  of  the  dynamic 
range  is  achieved  by  extending  the  set  of  basis  primes  being  used.  RN'S  arithmetic  has 
a  natural  short  wordlength  parallelism  which  avoids  the  slowdown  caused  by  very  long 

^  MATLAB  is  a  registered  trademark  of  Tbe  MatbWorks,  luc 

2  Maple  is  a  j  egistered  trademark  of  Maple  Waterloo  Software,  luc. 

3 Mathematica  is  a  registered  trademark  of  Wolfiam  Research,  luc. 
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wordlength  arithmetic.  However  it  brings  with  it  other  difficulties  which  are  less  easily 
resolved.  Even  if  one  integer  divides  another  and  both  are  within  the  dynamic  range  o 
the  system  there  is  no  simple  division  algorithm  which  returns  this  result;  range  checking 
is  (at  best)  very  difficult;  comparison  is  not  an  RNS  operation.  RNS  arithmetic  systems 
and  processors  have  been  extensively  studied;  see  [13]  or  [15]  for  example. 

In  this  section,  we  discuss  the  implementation  and  use  of  GE  using  integer  arithmetic. 
Our  primary  focus  will  be  on  binary  integer  arithmetic.  The  use  of  RNS  ^ithmetic 
within  the  specific  context  of  adaptive  beamforming  was  discussed  m  [6],  [16].  The  latter 
reference  addresses  the  issues  of  dynamic  range  and  complexity  in  this  setting. 

3  1  Division-free  Gauss  eUmination.  The  simplest  way  of  modifying  GE  to  inte¬ 
ger  arithmetic  is  to  eliminate  the  divisions  by  performing  “cross-multiplications”  between 
the  rows  of  the  matrix.  This  is  the  form  which  generates  the  greatest  rate  of  growth  in 

the  dynamic  range.  This  corresponds  to  the  transformation  of  2  x  2  matrices 

This  is  achieved  by  multiplying  the  second 


b 

d 


to 


instead  of 


d  -  b{c/a)  J 


a  0 
0  ad  — be  ^ 

row  by  a,  and  subtracting -  ^  •  r 

rithm  consists  of  repeating  this  for  all  the  appropriate  submatrices  as  in  the  basic  form 

of  the  algorithm  in  Section  2. 


From  it  c  times  the  first  one.  The  overall  divisionless  GE  algo- 


Algorithm  3  Division-free  GE  ijk  form 

Jnput  n  X  n  integer  matrix  A  (and  right-hand  side  b  if  solving  a  system) 

Compute 

for  7  =  1  to  71  —  1 
for  j  =  7  4- 1  to  u 

bj  :=  aubj  -  Ujibi  (if  solving  a  system) 
for  /c  =  7  -h  1  to  n 

0>jk  *“  ""  (^ji^ik 

Oji  :=  0 

Output  (modified)  matrix  A  (and  b  if  solving  a  system) 

Depending  on  the  individual  task  being  performed,  this  elimination  algorithm  w^ould 
then  be  followed  with  the  appropriate  final  stages  —  a  modified  back  substitution  for 
solving  a  system  or  other  modifications  of  the  algorithms  of  Section  2  for  rank  determi¬ 
nation  or  determinant  calculation.  These  are  discussed  in  Section  4  in  the  context  of  the 
new'  version  of  the  algorithm. 

It  is  easy  to  see  from  the  2x2  situation  that  the  matrix  elements  are  likely  to  grow 
rapidly  during  this  process.  By  way  of  contrast  Wilkinson  [19]  establishes  that  there  is 
no  growth  in  the  dynamic  range  if  partial  pivoting  is  used  with  Algorithm  1. 

3,2.  Growth  in  dynamic  range.  The  question  of  the  range  giowth  in  divisionl^ 
GE  w'as  addressed  in  some  detail  in  [6],  [7],  [16]  in  order  to  analyse  the  possibilities  for 
R.NS  arithmetic  within  the  context  of  adaptive  beamforming.  In  this  section  we  begin  by 
summarizing  these  results  for  general  integer  arithmetic.  In  order  to  develop  satisfactory 
algorithms  for  the  various  underlying  problems,  it  is  also  useful  to  examine  the  relatmn 
between  the  matrix  entries  arising  from  the  divisionless  algorithm  and  the  corresponding 
algorithm  using  division.  This  question  Is  addressed  later  in  this  subsection. 
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To  get  a  feel  for  the  potential  range  gro«^h  in  the  divisionless  GE  algorithm,  consider 
fir  a  2  X  2  matrix  with  integer  entries  in  the  range  [-M,  A/].  Lsual  binary  integer 

hav.  .  of  th.  form  \-M,M  -  1);  b»t  for  the 
reLrk  that  the  initial  range  does  not  necessarily  match  the  available  d>namic  range.  T 
symmetric  range  simplifies  the  analysis  of  the  gro'^'^h. 


The  transformation  of  the  2x2  matrix 


to 


b 

ad  —  be 


results  in  an 


T  hr*  £1  M^l  If  M  =  2^  -1,  this  implies  that  the  dynamic  range 

element  dd  be  €  2iW  »  J*  .  -  j.  .  Kite  to  ■4'  2 

reauired  has  increased  from  a  required  minimum  wordlength  of  A  -H  i  bits  to  2A  -H  ^ 

bite.  This  same  exponential  rate  of  groirth  is  possible  at  every  stage  of  the  outer  loop  of 

^'^Th1^"the  initial  required  wordlength  is  (approximately)  doubled  N- 

the  divisionless  GE  elimination  for  an  N  x  N  matrix.  The  final  wordlength  needed  is 

therefore  around  2^'"^  {K  +  1).  It  is  easy  to  see  that  this  would  very  quickly  exhaust  any 

normalh'  available  dynamic  range.  •  j  4.  •  ♦  r  7l 

For  example,  if  K  were  just  3  so  that  the  initial  integer  range  is  restricted  to  just  r ,  7J 

with  N  =  6,  the  final  dynamic  range  would  need  a  wordlength  of  at  least  2x4-  12 
bite  which  is  already  double  the  length  of  any  commonly  found  built-in  integer  forrna_ 
Clearly  for  a  more  realistic  range  for  the  initial  data  and  larger  matrix  dimensions  this 
grovi'th  would  very  quickly  exceed  all  plausible  ranges  even  for  software  implementation. 

In  the  case  of  complex  integer  arithmetic  as  in  [7]  the  growth  is  even  slightly  more 
rapid  than  the  above  analysis  suggests  and,  perhaps  more  importantly,  the  worst  case 
growth  is  achieved  in  realistic  examples.  Clearly  it  becom^  necessary  to  find  of 

^ricting  this  groi^ah.  The  standard  approach  is  to  search  for  greatest  common  divisors 
and  to  factor  them  out  of  subsequent  computation. 

3.3.  Reducing  the  range  growth.  In  the  next  section,  we  consider  how  to  teke 
full  adx^ntage  of  common  factors  which  result  from  the  division's  algorithm.  In  this 
section  we  look  first  at  the  simplest  range  reduction  technique  resulting  from  the  lemoval 
of  unnecessary  factors. 

Greatest  common  divisors.  The  simplest  approach  is  just  to  divide  out  any  com¬ 
mon  factors  in  the  “cross-multiplication”  operations  which  are  at  the  heart  ofjh® 
sionless  GE  algorithm.  Again,  this  idea  is  easily  understood  by  considering  le 

elimination  beginning  with  .4  =  [  ^  ^  •  Suppose  that  a,c  have  a  common  factor  p 

so  that  there  exist  hitegers  a,  7  such  that  a  =  pa,  c  =  p  i-  Then  the  usual  dhisionless 

^  ^  can  be  replaced  by  subtracting  7  times  the  first  row 


transformation  to 


ad  ^  be 

from  a  times  the  second  to  yield 


since  ac  —  07  =  ap7  —  poc^  —  0. 


a  b 

I  0  od  “  ^7  j  j  /  \  f 

This  suggests  that  it  is  desirable  to  find  the  greatest  common  divisor  gcd{a,c),  oi  a  c 
before  proceeding  with  the  elimination.  This  same  principle  can  be  applied  at  each  step 
of  the  process  and  it  can  be  applied  in  more  general  setting  than  just  integer  anthme  ic 
For  exLple,  this  idea  is  incorporated  into  the  polynomial  fraction-free  Gauss  elimination 

^”"Furthel°IimScation  may  be  achievable  using  the  ged  in  the 

that  a  complete  row  of  the  matrix  at  some  stage  has  a  nontrivial  common  factor.  This  too 
can  be  divided  out,  reducing  the  range  giowth  in  subsequent  stages  of  the  elimination. 
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Of  course,  if  the  goal  is  to  obtain  deti4  then  a  record  of  (the  product  of)  th«e  factors 
must  be  kept  to  multiply  the  final  result.  Finding  the  gcd  of  a  complete  row  of  a 
of  annhing  more  than  very  small  dimension  may  be  at  a  price  which  does  not  justifj'  the 
savings.  The  simplest  technique  for  finding  gcd(a,c)  is  the  Euclidean  algorithm  which  is 
easily  implemented  as  follows: 

Algorithm  4  Euclidean  algorithm  for  integer  gcd 

Input  Positive  integers  c<  a 
Initialize  q  :=  c,  tmpc  :=  o 
Repeat 

tmpa  :=  tmpC,  tmpC  :=  q 
q  :=  tmpa  mod  tmpC 
until  q  ~0 

Output  gcd(a,  c)  is  tmpc 

This  can  be  used  within  the  following  modified  divisionless  GE  algorithm. 

Fraction-firee  algorithm.  A  fraction-free  version  of  GE  can  be  defined  using  the 
gcd  algorithm  above  to  reduce  the  range  growth  effect.  The  resulting  algorithm  needs  no 
divisions  beyond  the  removal  of  known  factors. 

Algorithm  5  Fraction-free  GE  using  greatest  common  divisors  ijk  form 

Input  nx  n  integer  matrix  A  (and  right-hand  side  b  if  solving  a  system) 
Compute 

for  i  =  1  to  n  —  1 
for  j  =  i  -t- 1  to  n 
p  :=  gcd(a«,Oji) 
a  :=  Oii/p-,  7  :=  ^ji/p 
bj  .—  Qbj  -  'rbi  (if  solving  a  system) 
for  fe  =  i  -t- 1  to  n 

Cjk  ■=  OOjk  —  "tCik 

aji  :=  0 

Output  (modified)  matrbc  A  (and  b  if  solving  a  system) 

Since  we  have  no  advance  knowledge  of  the  existence  of  nontrivial  factors  (that  is 
p  >  1)  this  algorithm  does  not  obviously  moderate  the  worst  case  range  growth  of  the 
above  analysis.  In  fact  the  comparative  analysis  of  the  next  section  suggests  that  it  would 
have  some'beneficial  effect  —  but  at  considerably  greater  cost  than  is  necessary. 

Using  the  gcd  of  complete  rows  can  be  incorporated  into  the  algorithm  at  any  stage. 
Potentially  such  factors  could  exist  in  any  row  of  the  active  matrix  and  such  factors  could 
be  sought  in  every  row  at  every  stage  of  the  elimination.  The  algorithmic  changes  are 
straightforward  but  the  cost  is  likely  to  be  too  high  and  so  we  do  not  elucidate  further 
here  —  except  for  one  very  important  special  situation  which  does  arise  as  a  result  of  the 
elimination  process  itself. 


4.  The  new  algorithm 

The  development  of  the  new  fraction-free  integer  GE  algorithm  is  based  on  a  comparison 
between  the  matrix  entries  which  result  with  or  without  divisions.  The  most  impo^ant 
result  of  this  analysis  is  that  certain  common  factors  are  found  to  be  generated  by  the 
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1  ^  M-nv  These  can  be  removed  using  exact  integer 

«, » .eve,  whiC  ..y  be  coesider«. 

in  the  problem. 

1.  1  j  ciaiiss  elimination.  In  this  subsection, 

of  GE  under  consideration.  For  simplicity  here  we  shall 

and  shall  assume  that  the  algorithm  does  not  br^k  down  due  ^  f 

:?".e  -  —■>  *»' ‘  r“.be  i'rsE"“ 

below  the  diagonal  in  the  i-th  column.  The  entr.es  resulting  from  the  i-th  ^ith  dm 
sions  are  denoted  by  a}g.  The  corresponding  entries  for  the  divisionless  form  Algont 
will  be  denoted  by  6^2  U-  ^  >  ’)•  algorithms  are  therefore 

given  by  (j  =  0, 1, ...,  A  -  1:  fc  >  j)  where  4,  =  6,,  =  au- 

To  see  the  relations  between  the  entries  and  “S  ’  transfo.mat.on 

in  the  current  notation, 

.rn  ,  _  r  j 


=  ad-bc=  a\d-  b(c/a)]  =  aii42 


and  that  the  conesponding  relation  would  hold  for  all  other  matrix  entries  resulting  from 
the  first  stage.  That  is  I'W 

6W  =  a„a>y  (2<J,fc<.V) 

I„  narticular  we  see  th.t  is  tl.e  determinant  of  tlte  tol>left  2  x  2  submatrix  It  will 
I:  Cim  to  denote  thifby  In  general,  we  shall  d.„0«  b,  d,  tl.e  deter, n.nat.t  of 

‘^'^!tt  :::g:'„Ttt  elimilion.  is  «entia,ly  tl.e  same  as  this  hrst  s^ge  o^^g 
on  rb:u.m.ri*g,.t  (,V  - 1)  x  ,.V-  „  senate  sub...at,ix  -  the  ■aetiv.  .name  .  U  Mow 
that  there  is  a  further  scaling  of  all  affected  entries  b>-  the  pivot  element  b,,  .  Thus. 

;,^y  =  a„4yajy  {3<j.k<N)  (4) 

and  continuing  in  this  way  we  obtain  the  general  relation: 


/)^2  = 


(i  <  j-k  <  eV ) 


(5) 
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E^rh  “final”  divisionless  entry  is  its  corresponding  value  from  Algorithm  1  scaled  by 
the  product  of  all  the  final  diagonal  entries  of  the  divisionless  algorithm  above  it. 

We  note  immediately  that  this  analysis  suggests  that  entri^  in  the  active  matrix  have 
common  factors  which  have  been  included  by  the  algorithm  itself.  These  represent  one 
obvious  way  of  reducing  the  range  growah  and  altering  the  algorithm.  This  modification, 
which  is  presented  below,  will  result  in  each  diagonal  element  being  the  determinant  di 
of  the  principal  minor  of  the  appropriate  dimension. 

4.2.  A  new  fraction-free  GE  algorithm.  From  (4)  it  follows,  in  particular,  that 

=  ail4y®33^  ~  “11  (ailo4y)  ~  011^^3 

where,  we  recall,  da  =  an42  0^3^  is  the  determinant  of  the  principal  3x3  minor.  We 
remark  that  da  is  an  integer.  It  follows  that  has  the  factor  an.  Using  this  same 
reasoning,  we  see  that  has  a  factor  an  for  ever}’  j,k>  3  since  each  such  element  is 
an  times  the  (integer)  determinant  of  a  3  x  3  minor.  This  known  factor  can  easily  be 
removed  prior  to  the  next  stage  of  the  elimination. 

Similar  factors  are  introduced  into  the  active  matrbc  at  each  subsequent  stage.  These 
too  can  be  divided  out.  The  factors  introduced  at  the  subsequent  stages  of  the  elimination 
are  the  (modified)  diagonal  entries.  Since  these  are  known  factors,  there  is  no  need  for 

any  calls  to  a  gcd  function.  ,  •  v  4. 

The  removal  of  these  factors  has  an  obvious  effect  on  the  range  grovii:h  m  subsequent 

stages  of  the  elimination  even  though  its  effect  on  the  worst  case  analysis  is  unclear.  The 
growth  in  the  required  wordlength  could  be  computed  “on-the-fly  ’  in  order  to  take  full 
advantage  of  this  saving. 

The  division  by  these  factors  is  incorporated  into  Algorithm  6  below.  We  oteerve  that 
this  algorithm,  like  Algorithm  5,  is  fraction-free  but  not  division-free.  All  divisions  that 
are  performed  are  integer  operations  with  exact  integer  results. 


Algorithm  6  Fraction- free  GE  ijk  form 

Input  nxn  integer  matrix  A  (and  right-hand  side  b  if  solving  a  system) 

Compute 

for  7  =  1  to  7?  —  1 
for  j  =  7  +  1  to  7? 

bj  :=  Qiibj  -  a^ibi  (if  solving  a  system) 
for  =  7  +  1  to  7? 

ajk 
aji  I—  0 

if  7.  >  2  then  (renio\-al  of  common  factors) 

for  j  =  i  -h  1  to  7? 

bj  :=  bj/oi- 1 1  (if  solving  a  system) 
for  /c  =  7  -f  1  to  n 
ajk  := 

Output  (modified)  matrix  -4  (and  b  if  solving  a  system) 

To  gain  some  insight  into  the  saving  that  results  from  this  algorithm,  consider  just  the 
final  value  of  o,v.v.  In  the  division-free  Algorithm  3,  using  (5)  for  i  =  -  1,  we  have: 

"S.\’  <ill'’22  ■  ■  ■  “.V-  1,.V-  l“.VA' 
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V-3 


v-3)  \ 

-2, A'- 2  f 


(6) 


(7) 


which  in  turn  yields 

The  corresponding  final  \;alue  for  Algorithm  6  is  just 

o.v,v  =  det  A  =  dN 

which  is  typically  very  much  smaller  since  these  additional  factors  have  been  remov^ 
during  the  modified  elimination.  Indeed  it  turns  out  for  this  Algorithm  6  that  at  the 
conclusion  of  the  elimination  we  have 

an  =  di-  (®) 

Example  1.  Comparison  of  Algorithms  3  and  6  for  a  4  x  4  matrix.  The  results  of  exact 
arithmetic  in  Algorithm  1  are  also  included  for  comparison  and  illustration. 

Let 


A  = 


8  7  4  1 
4  6  7  3 
6  3  4  6 
4  5  8  2 


Algorithm  1  yields  the  following  sequence  of  modified  matrices: 


■  8  7 

4 

1 

■  8 

7 

4 

1 

■  8 

7 

4  1 

0  5/2 

5 

5/2 

0 

5/2 

5 

5/2 

0 

5/2 

5  5/2 

0  -9/4 

1 

21/4 

0 

0 

11/2 

15/2 

0 

0 

11/2  15/2 

0  3/2 

6 

3/2 

0 

0 

3 

0 

0 

0 

0  —45/11 

from  which  we  may  deduce  that  det  A  —  (8)(5/2)(ll/2)(  45/11)  450. 

The  division-free  Algorithm  3  gives: 


8-7  41 

0  20  40  20 

0  -18  8  42 

0  12  48  12 


8  7  4  1 

.0  20  40  20 

0  0  880  1200 

0  0  480  0 


8  7  4 

0  20  40 

0  0  880 

0  0  0 


1 

20 

1200 

-576000 


which  shows  the  very  rapid  growth  which  is  possible  with  this  algorithm. 
The  fraction-free  Algorithm  6  also  gives 

8  7  4  1 

0  20  40  20 

0  -18  8  42 

0  12  48  12 


8  7  4  1 

0  20  40  20 

0  0  880  1200 

0  0  480  0 


but  the  active  part  of  this  matrix  is  then  divided  by  the  common  factor  8  to  yield 


I 

20 

0 

0 


4 

40 

no 

60 


1 

20 

150 

0 
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which  in  turn  gives 


1 

20 

150 

-9000 


The  active  matrix  is  now  just  the  bottom-right  element  which  needs  to  be  divided  by  the 
“common  factor”  20  to  yield; 

■  8  7  4  1  ■ 

0  20  40  20 

0  0  no  150 

0  0  0  -450 

The  final  values  of  the  diagonal  entries  can  easily  be  seen 

products  of  the  diagonal  of  the  final  upper  triangle  generated  by  "t^m  1) 
determinants  of  the  appropriate  principal  minors,  as  predicted  by  (  0  an  (  ). 

Although  there  has  been  some  growth  in  the  magnitudes  of  the  matrix  elemente  it 
has  been  kept  in  much  better  clioik.  This  growth  is  no  greater  than  that 
accommodated  if  the  determinant  of  the  original  matrix  is  to  be  computed.  The  i^  hole 
of  this  computation  could  have  been  achieved  using  standard  16-bit  integer  arithmetic  - 
even  including  the  temporary  values.  The  division-free  algorithm  in  this^e  r^uires  at 
least  20  bits  even  though  the  original  matrix  has  integer  elements  bounded  by  ^ 

The  benefits  in  terms  of  range  growth  derived  from  using  Algorithm  6  in  the  above 
example  would  be  essentially  matched  by  using  Algorithm  o  in  tliK  case  but  the  la  ter 
algorithm  would  abo  require  6  gcd  operations.  For  Algorithm  6,  the  factors  to  be  removed 
are  known  and  automatically  stored  as  part  of  the  matrix.  The  sequence  of  matrices 
generated  by  Algorithm  5  for  this  example  is 


8  7  4  1 

0  5  10  5 

0  -9  4  21 

0  3  12  3 


8  7  4  1 

0  5  10  5 

0  0  110  150 

0  0  30  0 


8  7  4  1 

0  5  10  5 

0  0  110  150 

000  -450 


We  should  observe  however  that  it  is  not  generally  the  case  that  Algorithm  5  yields 
ov  V  =  d/vr  nor  that  it  restricts  the  growth  in  the  dynamic  range  as  effectively  as  this. 
Note  that  the  original  matrix  has  a  common  factor  of  2  throughout  its  first  column. 

Remo^^l  of  common  factors  from  rows  of  the  active  matrix  for  the  above  example 

would  yield 


8  7  4  1 

0  12  1 
0  -9  4  21 

0  14  1 


8  7  4  1 

0  12  1 
0  0  11  15 

0  0  10 


8  7  4  1 

0  12  1 

0  0  11  15 

0  0  0  -15 


which  clearlv  has  a  greatly  beneficial  effect  on  the  dynamic  range  -  but  at  a  consideiab le 
cost  There  are  22  gcd  operations  required  in  order  to  achieve  this  sa\  mg.  If  the  det  A 
required  it  is  not  a  simple  matter  to  recover  it  from  the  resulting  upper  tnangii  ar  factor 
even  with  a  knowledge  of  the  factors  which  have  been  removed.  If  a  linear  s>stem  ueie 
to  be  solved,  the  number  of  gcd  operations  incretises  yet  further  with  no  guaiantee  of 
retaining  the  savings  seen  here. 
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It  it  infonMtive  to  consider  .  second  example  «.hich  is  jnst  a  column  permutation  of 
this  first  one. 

Example  2.  We  repeat  much  of  Example  1  for  the  matrix 

'7418' 

6  7  3  4 
3466 
5  8  2  4 


which  is  permuted  so  that  there  is  no  coDvenieot  common  factor  in  the  first  column 
The  division-free  Algorithm  3  gives; 

7  4  1  8 

0  25  15  -20 

0  0  735  770 

000  551250 


■741  8 

■74  1  8  ■ 

0  25  15  -20 

0  16  39  18 

0  36  9  -12 

0  25  15  -20 

0  0  735  770 

0  0  -315  420 

which  again  shows  the  expected  very  rapid  growth. 

The  corresponding  sequence  of  matrices  generated  by  the  fraction' 


free  Algorithm 


6  is 


'741  8 

0  25  15  —20 
0  16  39  18 

0  36  9  -12 


7 

4 

1 

8 

0 

25 

15 

-20 

0 

0 

105 

110 

0 

0 

—45 

60 

7  4  1  8 

0  25  15  -20 

0  0  105  no 

0  0  0  450 


in  which  we  see 
matrix  but  that 


not  only  that  the  final  value  of  044  is  the  determinant  of  the  ori^nal 
this  matrix  is  positive  definite  since  all  its  principal  minors  have  positive 


determinants. 
In  this  case 


using  Algorithm  5  so  that  common  factors  are  removed  form  the  multiph 


ers,  we  get 


7 

4 

1 

8 

0 

25 

15 

-20 

0 

16 

39 

18 

0 

36 

9 

-12 

7 

4 

1 

8 

0 

25 

15 

-20 

0 

0 

735 

770 

0 

0 

-315 

420 

7  4  1  8 

0  25  15  -20 

0  0  735  770 

0  0  0  15750 


by  “removal-’  of  the  common  factor  35  from  735  and  -315  in  the  multipliers 

list  step.  Clearly  this  time  the  range  reduction  achieved  by  Algorithm  0  is  much  infer 

«  r  .>  _ _  .  1  ,.^4- K 


to  that  of  the  new  algorithm 
■Rf'inovai  of  common  factor 


‘741  8 

0  5  3  —4 

0  16  39  18 

0  12  3  -4  _ 


7  4  18 

0  5  3  -4 

0  0  21  22 
0  0-34 


7 

4 

1 

8 

0 

5 

3 

-4 

0 

0 

21 

22 

0 

0 

0 

150 

The  beneficial  effect  on  the  dynamic  range  is  again  apparent  but  the  disadvantages  noted 


after  Example  1  remain. 
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4  3  Applications,  to  this  section,  describe  the  effect  of  the  t^.™-frce 

GE  Alsorithm  6  on  the  solntion  of  the  rnrious  onderlying  problems.  From  the  observat.o|» 
to  theLt  section,  it  Is  plain  that  the  eratotion  of  det  A  is  partrcnlarly  simple,  provided 
only  that  no  zero  pivots  arise  during  the  computation. 

Linear  systems.  For  the  solution  of  a  (nonsingular)  linear  system  using  Algorithm 
6  there  is  of  course  no  guarantee  that  the  solution  vector  has  only  integer  components. 
If  the  svstem  is  known  (because  of  the  context,  perhaps)  to 
then  this  can  be  computed  by  applying  the  conventional  back 

or  Algorithm  8,  for  example)  in  which  case  the  divisions  will  again  have  _ 

Otherwise  the  solution  can  be  expressed  as  fractions  using  rational  arithmetic 
obvious  modifications  of  Algorithm  2. 

Example  3.  Consider  the  coefficient  matrix  of  Example  1  udtii  the  original  right-band 
side  vector  [45, 30, 40, 30]^. 

Algorithm  6  would  generate  the  augmented  matrix 


■  8 

7 

4 

1 

45 

0 

20 

40 

20 

60 

0 

0 

110 

150 

260 

0 

0 

0 

-450 

-450 

from  which  we  obtain  the  integer  solutions  as  X4  =  (-450)/ (-450)  -  L  so  that  X3  [260 
150(1)1/110  =  1.  Then  X2  =  [60  -  40(1)  -  20(1)]  /20  =  0  and  Xj  =  (4o  -  4  -  l)/8  -  5 

If  we  did  not  know  in  ad^’ance  that  the  original  system  has  an  integer  solution  vector, 
rational  arithmetic  would  be  used  to  generate  the  equivalent  results. 

Determinant.  We  have  already  seen  that  Algorithm  6  delivers  det  A  automatically 
as  the  final  value  of  Furthermore,  since  the  final  diagonal  consists  of  the  determi¬ 

nants  of  the  principal  minors  of  increasing  dimension,  positive  or  negative  (semi-)  de  - 
niteness  can  also  be  detected  simply  and  automatical!}. 

Rank  determination.  In  the  absence  of  a  zero  pivot,  again  there  is  nothing  further 
to  be  done.  In  the  event  of  such  a  zero,  then  interchanging  the  pivot  row  with  any  lower 
row  with  a  nonzero  entry  in  the  pivot  column  will  allow  the  fraction-free  algorithm  o 
proceed.  Simply  counting  the  number  of  nonzero  entries  on  the  diagonal  gives  the  rank 
as  before.  This  simple  interchange  is  not  necessarily  the  optimal  pivoting  strategv 

integer  computation.  ,  ^  j  j 

In  the  event  of  a  rank-deficient  matrix,  obtaining  the  solution  space  for  an  underde¬ 
termined  system  can  be  accomplished  by  modifying  the  back  substitution  algorithm  in 
just  the  same  way  as  for  real-number  computation. 

Pivoting.  Clearlv  in  t  he  event  of  a  zero  pivot  some  pivoting  is  necessary  in  any  good 
implementation  of  GE.  The  question  is  what  is  the  right  strategy  for  integer  computing. 
Choosing  the  largest  element  in  the  floating-point  algorithm  has  the  virtue  of  keqiing  all 
multipliers  small  and  therefore  restricting  the  growth  of  the  matrLX  elements.  However 
that  restricted  growth  is  only  realized  because  of  the  divisions.  .  . 

At  least  intuitivelv.  choosing  the  largest  element  in  the  pivot  column  is  not  nec^anly 
good  for  integer  computation.  Indeed  a  large  prime  pivot  is  probably 
that  appears  to  almost  guarantee  rapid  growth  in  the  dynamic  range.  In  (4)  and  then  (o) 
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v™  ^  that  each  bg'  hae  the  factor  o„  and  e»:h  farther  etage  has  this  factor  and  then 

has  tt  repeated  in  the  factors  lit'’-  The  earlier  a  particul» 

impact  it  has  on  subsequent  range  gro>vth.  This  is  made  plain  in  equation  (6). 

Tn  the  fraction-free  version  Algorithm  6,  however,  the  range  grovah  ^ 

dependent  of  the  lal^t'o^^e  principal  minor  determi- 

MntsTnd^  thi  vi  is  invariant  (up  to  sign  changes)  under  rw’  f 

S  trsimplest.  pivoting  strategy  is  probably  the  best  for  this  algorithm.  That  is  at 
stage  i,  we  should^use  the  first  row  for  which  the  pivot  column  entrj-  is  nonzero,  choose 

’"“ft  Z  f.»"crilwe  ttafa^dWerent  ordering  may  represent  some  nrinor  »” 

this  For  example  looking  for  pivots  which  are  either  powers  of  the  binary  (or  other  com 
pntationnl)  base  may  make  subsequent  dbnsions  particularly  simple 

for  the  appmpriate  pivot  element  in  this  regmd  would  (almost  surely)  be  more  wasteful 
than  beneficial. 

5.  Complexity  of  the  fraction-free  algorithm 
In  this  section  we  begin  by  obtaining  tiie  basic  integer  operation  counts  for  the 
free  GE  Algorithm  6.  There  are  two  essential  differences  between  these  counts  and  those 
for  the  floating-point  algorithms:  the  fundamental  elimination  loops  contain  no  divisions 
but  have  twice  as  many  multiplications,  and  there  is  the  additional 
removal  of  the  common  factors  after  the  first  two  stages  so  that  divisions  occur 

^^'^The 'additional  multiplications  in  the  elimination  aie  easily  counted.  Tl^  common 
factir  removal  entails  only  divisions.  The  total  number  of  these  can  be  assessed  from  the 
loop  control  limits.  For  each  i  >  2,  there  is  one  division  in  the  j-loop  .  , 

are  solving  a  svstem)  which  runs  from  j  =  i  +  lton.  Also  in  this  loop  is  the  A-loop  v  Inch 
has  the  same  limits  and  contains  another  division.  The  total  division  count  is  theiefoie 

V(n  -  r)(n  -  i  +  1)  =  f^l(l  +  1)  =  -  l)(n  -  2) 

'=> 

The  total  operation  counts  are  summarized  in  Table  4. 

TABLE  4  Integer  arithmetic  operation  counts  for  the  fraction-free  GE  Algorithm  6  for 
solution  of  an  n  X  n  linear  system  Ax  =  b. 


Oneration  Matrbc  Factorization  Right-hand  side  TOTAL 

/n'i  ZT) -  ^n(n-I)  -  1)  {2«  +  5 

n(n-l)  n(n-l)(2n  +  5) 

|(n-2)(n- l)(2n-3]  ^(n-l)(7?-2)  ^n(n  -l)(n  2) 


or  — 


/ 


Note  that  this  operation  count  does  not  include  the  back  substitution.  \Miat  we  s^ 
most  importantly  is  that  the  number  of  multiplications  has  doubly 
number  of  divisions  has  increased  by  a  complete  order  from  0{n  )  to  0(n  )• 

Unfortunately  this  is  not  the  end  of  the  story.  The  complexity  of  ^  ^ 

further  increased  bv  virtue  of  the  fact  that  these  integer  divisions  are  tv  pica  Iv  moie 
difficult  than  their  floating-point  counterparts  -  especially  with  tlie  ^o>^  th  which 

w^  have  alreadv  seen  can  be  substantial.  This  is  likely  to  necessitate  the  use  of  Ion, 
integer  arithmetic  for  which  sjiecial  algorithms  must  be  used. 
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5.1.  Long  integer  arithmetic.  Long  integer  arithmetic  can  be  simulated  using  mul- 

tiDle  words  of  some  basic  wordlength,  ,  \  or, 

^  For  example,  if  the  underlying  integer  wordlength  is  8  bits  (or  1  byte)  then 
integer  can  b^regarded  as  a  radix  2®  =  256  digit.  Conventionally  the  range  of  values  of  the 
base  256  digits  would  be  -128,  -127, . . . ,  127  so  that  signed  integers  can  be  represent^ 
S^implicitv  in  the  current  description,  we  restrict  our  attention  to  nonnegative  mt  jem 
with  a  digit  range  of  0, 1, ,  255.  Very  large  integers  could  then  be  stored  using  a  Nector 

of  such  integers  using  conventional  place  value.  a-  ^  v  n  _  oL 

In  general,  suppose  the  base  wordlength.is  L  bits  so  that  the  effective  radK  R  2 
The  vector  (do ,  di ,  •  • . ,  dft  - 1 )  would  then  be  used  to  represent  the  integer  N  £[  ,  j 


given  b\' 


N  =  +  dK-2R^~^  +  •  •  ■  +  diR  +  do 


(9) 


whei-e  each  digit  satisfies  0  <  d,  <  R.  For  efficient  arithmetic  using  such  a  representa¬ 
tion  we  require  an  integer  accumulator  with  2L  bits.  Among  other  cons^uences  of  this 
are  that  Sdition  can  be  computed  in  “digit-parallel”  with  subsequent  attention  to  any 
carries  which  may  be  propagated.  A  large  radix  carry  lookahead  or  randitional  sum  adder 
could  be  constructed  to  improve  the  efficiency  of  addition.  These  addition  ajgonth^  are 
simple  generalizations  of  the  usual  binary  addition  algorithms  which  are  described,  for 

Multiplication  of  long  integers  can  be  achieved  with  reasonable  ^ciency  using  the 
convolution  form  of  the  product  working  with  the  component  words  of  the  large  integers. 
A-ain  for  simplicitv.  we  shall  only  consider  multiplication  of  positive  integers.  The  mul¬ 
tiplication  of  two  integers  in  the  form  (9)  will  result  in  an  integer  whose  representation 

requires  at  most  2K  L-bit  words. 

Suppose  that  we  require  the  product  of  the  long  integeis 


777 


T=0 


K-1 


n  =  53  l3iR' 


This  product  is  given  by 


t=0 


777  » 


(10) 


where  we  use  the  convention  =  0  if  j  <  0.  This  is  essentially  a  convolution  product  of 

^'’^No^'.^cr coefficient  product  a,3,-i  can  be  viewed  as  a  ^e-i?^ 

write  in  the  form  dd.-i  =  where  each  and  ^  is  a  ^ 

is  the  most  significant  and  5a-,.  the  least  significant  R-digit  of  Qi3k-i-  The  pioduct 

(10)  can  therefore  be  written  as 

.  „  =  'g'  &„)  =  E  (E  -  ‘m)  «■  (») 

t^o  V.=0  /  fc=o  V'=o  ^ 

where  ')A,i  =  =  0  for  i  >  k. 

Remark  2.  Carries  beyond  the  ^  position  are  not  possible  since  m*n  <  (R^ -if  < 

R-^'. 
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Remark  3.  The  results  of  the  iaaer  sums  in  (11)  '■rill  usually  create  further  caries  which 
must  be  accounted  for.  However,  we  note  that  for  almost  any  minimal  degree  of  parallelism 
in  the  processor,  each  of  these  inner  sums  can  be  performed  simultaneously  since  they 
consist  of  at  most  2K  terms  each  less  than  R  and  we  would  expect  2K  «  R.  Therefore  the 
sum  will  be  less  than  R?  so  that  it  mil  be  representable  in  a  double  length  accumulator. 


For  the  8-bit  basic  wordlength  the  restriction  is  only  that  our  integers  do  not  exceed 
^28^256  _  22048  >  which  is  well  beyond  any  tjTJical  integer  computing  range  for 

linear  algebra  problems.  .  .  r  -  j  *i  r  » 

The  length  of  the  inner  sums  in  (11)  will  also  restrict  the  size  of  any  carry  and  therefore 

mav  be  useful  in  bounding  the  range  of  the  propagation  of  such  carries.  This  wuld  be 
us^  to  improve  the  efficiency  of  such  a  convolution  product.  We  do  not  consider  such 

details  further  in  this  report.  ,  •*  r 

What  effect  does  such  a  long  multiplication  have  on  the  arithmetic  complexity  ol 

an  integer  algorithm?  The  product  formula  (10)  entails  basic  multiplications.  The^ 
components  must  then  be  broken  into  their  two  component  digits  and  then  approximately 
additions  together  with  the  carries  these  generate.  In  Table  4,  this  means  that  each 
of  the  (approximately)  |n®  multiplications  entails  something  like  2K  regular 
arithmetic  operations.  For  even  quite  moderate  range  growth  with  K  =  4  this  has  the 
effect  of  increasing  this  part  of  the  complexity,  by  a  factor  of  32. 

We  conclude  this  section  with  a  brief  comment  on  the  impact  of  parallelism  on  per¬ 
forming  Gauss  elimination  using  integer  computation  and  on  the  long  integer  arithmetic 
required  for  the  growth  in  the  dynamic  range.  An  array  processor  could  be  used  to  ac¬ 
celerate  this  algorithm  greatly.  Firstly,  all  the  coefficient  products  in  the  inner  sum  in 
(10)  can  be  computed  simultaneously.  The  realignment  of  the  upper  and  lower  halves  of 
these  products  needed  for  the  inner  summation  in  (11)  is  then  a  simple  shift  of  data  to  a 
neighboring  processor  and  these  sums  could  then  also  be  performed  simultaneously.  The 
final  carries  would  be  the  only  part  that  requires  serial  processing. 

Division  of  long  integers  can  also  be  achieved  by  generalizing  some  of  the  standard 
algorithms  which  are  used  in  binary  integer  hardware  such  as  the  SRT  division  algorithm 
which  is  based  on  the  idea  of  nonrestoring  division.  This  algorithm  is  well-suited  to 
high-radbc  division  and  so  can  be  modified  to  the  long  integer  framework.  The  SRT 
algorithm  relies  on  repeated  addition  and  subtraction  using  signed  digits.  Since  we  haw 
onlv  dealt  with  arithmetic  of  nonnegative  long  integers  here,  we  do  not  discuss  the  detailed 
implementation  further.  For  details  of  the  basic  algorithm  and  its  implementation  at  least 
for  radix  4,  see  [12]. 


5.2.  Computing  with  the  rationals.  In  this  section, we  consider  the  use  of  Gauss 
elimination  in  the  setting  of  rational  arithmetic.  Of  course  in  some  sense,  floating-point  ^ 
rational  arithmetic  but  it  uses  a  very  special  subset  of  the  rationals  and  does  not  comply 
with  the  axioms  of  com  entional  rational  arithmetic.  We  are  concerned  here  with  matrices 
with  entries  in  the  field  Q  of  rational  numbers.  The  arithmetic  operations  will  be  similar 
in  nature  to  those  of  Algorithm  15  for  back  substitution  in  the  integer  GE  solution  of  a 
linear  svstem. 

There  are  choices  to  be  made  over  the  way  in  which  rational  numbers  are  to  be  stored 
and  manipulated  within  the  computer.  The  conceptually  simplest  option  is  simply  to  store 
g  €  Q  as  a  pair  of  integers  representing  its  numerator  and  (positive)  denominator  in  its 
maximally  reduced  form.  Alternatives  that  liave  been  extensively  researched  include  the 
use  of  continued  fraction  representations  and,  in  particular,  the  lexicographic  continued 
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fraction,  or  LCF,  arithmetic  of  Kornerup  and  Matula  [8).  We  shall  only  consider  here 
the  [n/m]  form  in  which  a  rational  number  is  represented  as  a  quotient  of  two  co-prime 

integers 


where  n  m  €  7,  have  no  common  factors  and  m  >  0.  Arithmetic  operations  are  then 
defined  according  to  the  usual  rules  of  rational  arithmetic  with  reduction  to  this  “normal¬ 
ized”  form  after  each  arithmetic  operation.  ,  .  ,  , 

It  is  immediately  apparent  that  many  of  the  range  growth  problems  which  plague 
integer  GE  will  reappear  here  with  comparable  severity.  Of  course  the  divisions  which  are 
inherent  in  the  basic  forms  of  GE  and  LU  factorization  can  be  performed  here  and  r^trict 
the  rano-e  of  %-alues  of  the  rational  numbers  being  represented  -  but  these  divisions  do  not 
necessarily  reduce  the  range  of  integers  needed  for  the  numerators  and  denominators 
separately.  The  chief  virtue  that  would  be  derived  here  is  the  elimination  of  any  rounding 
errors  and  therefore  definitive  answers  to  questions  such  as  the  rank  of  the  matrix  and 
exact  values  for  the  determinant  and  for  the  solution  vector  of  a  linear  system. 

The  algorithm  for  performing  Gauss  elimination  with  a  rational  matrix  can  be  any 
variant  of  Algorithm  1  with  real  arithmetic  operations  replaced  by  rational  arithmetic. 
There  is  no  benefit  in  detailing  this  algorithm  explicitly.  A  more  detailed  discussion  in 
terms  of  both  complexity  and  the  integer  range  growth  for  rational  Gauss  elimination  is 
included  in  [17].  In  this  section,  we  content  ourselves  with  a  summary  of  some  relevant 
results. 

Complexity.  The  most  obvious  increase  in  complexity  arises  out  of  the  mere  fact 
that  it  is  performing  rational  arithmetic  and  so  every  arithmetic  operation  entails  manii> 
ulation  of  both  the  numerator  and  denominator.  There  is  also  the  further  complic^ion 
arisino-  from  the  reduction  of  each  ordered  pair  to  represent  an  irreducible  fraction.  This 
require  either  that  each  integer  is  stored  as  a  product  of  its  prime  factors,  or  more  rea¬ 
sonably,  that  the  Euclidean  algorithm  for  finding  the  gcd  of  two  integers  is  employed  and 
followed  with  two  integer  divisions.  Because  the  Euclidean  algorithm  is  iterative  we  can¬ 
not  determine  the  number  of  integer  mod  operations  that  are  required.  (Also  any  bounds 
which  could  be  derived  would  be  hopelessly  pessimistic.) 

In  Table  5.  we  list  the  numbers  of  conventional  integer  arithmetic  operations  together 
with  the  number  of  gcd's  that  are  needed.  Typically  we  might  expect  that  a  gcd  would 
be  equivalent  to  several  integer  divisions  and  that  these  divisions  are.  in  turn,  equivalent 
to  several  multiplications.  However  it  should  also  be  noted  that,  because  of  the  range 
crrowth,  the  multiplications  may  involve  long  wordlength  integers.  On  the  other  hand 
we  may  expect  the  gcd  to  be  much  smaller  so  that  the  divisor  wordlengths  may  be  less 
extreme.  In  the  operation  counts  in  Table  5,  no  attempt  is  made  to  account  for  dynamic 
range  considerations  or  the  relative  weights  to  be  gii-en  to  the  different  operations. 


TABLE  5  Integer  arithmetic  operation  counts  for  the  rational  GE  for  solution  of  an 
n  X  n  linear  system  Ax  =  b. 


Operation 

Matrix  Factorization 

Right-hand  side 

TOTAL 

+  or  “ 

- 1) 

2n^  (n  -  1) 

ln(n-  1) 

I??  (??  1)  (271  +  5) 

X 

371  (77  —  1) 

n  {n  “  l)  (277  +  3) 

/ 

gcd 

h 

In 

7?  (7?  —  1) 

^n(n  —  1) 

hi  (7i  -  1)  (277  +  5) 
1??  (77  —  1)  (277  +  5) 
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Again  the  biggest  single  effect  is  that  the  number  of  divisions  has  increa^  to  0(n3) 
in  addition  to  the  O  (n^)  gcd  operations.  This  time  the  number  of  multiplications  has 
also  increased  by  a  substantial  factor. 

Growth  in  dynamic  range.  In  attempting  to  analyse  the  potential  growth  in  the 
necessary  dynamic  range  for  rational  GE,  we  cannot  assume  any  useful  reduction  in  the 
rationalquantities  other  than  that  which  is  a  necessary  consequence  of  the  elimination 
procedure  itself.  For  simplicity  in  this  setting,  we  make  the  assumption  that  the  onpnal 
matrix  A  is  in  fact  an  integer  matrix.  This  allows  us  to  compare  the  results  of  the  rational 
akorithm  with  those  that  would  be  obtained  using  the  divisionless  integer  algorithm.  In 
much  the  same  wav  as  was  described  for  the  fraction-free  Algorithm  6,  this  comparison 
reveals  certain  common  factors  which  will  be  removed  by  the  various  reduction  steps. 
This  has  the  effect  of  reducing  the  potential  range  growth  in  a  similar  manner  to  that 
which  we  observed  in  Algorithm  6.  Indeed  the  dynamic  range  required  for  the  rational 
algorithm  turns  out  to  be  identical  to  that  for  Algorithm  6.  Therefore  if  the  initial  matrix 
is  an  integer  matrix,  using  rational  arithmetic  yields  no  benefit  relative  to  the  fraction-free 
algorithm.  If  the  original  matrix  consists  of  rational  entries,  the  range  grow-th  problem 
becomes  potentially  even  more  critical  since,  the  “cross- multiplications  ’  needed  for  the 
elimination  do  not  appear  to  generate  any  obvious  and  general  common  factors. 

5.3.  General  rings.  Finally,  we  consider  briefly  the  applicability  (and  application)  of 
GE  in  a  more  general  algebraic  setting.  We  are  interested  here  in  matric^  with  entries  in 
a  general  ring  Tl.  There  are  at  least  two  fundamental  questions  which  arise  immediately; 

“When  do  the  problems  have  solutions?”  and 

“When  do  the  algorithms  make  sense?” 

If  72.  is  a  unique  factorization  domain  (EFD)  then  the  algorithms  described  earlier  for 
integer  computation  remain  %-alid  while  any  rational  arithmetic  algorithms  have  obvious 
analogues  in  the  fidd  of  fractions  Q.  For  details  of  the  definitions  and  properties  of  the 
various  algebraic  structures  see  [4]  or  any  standard  text  on  abstract  algebra.  In  particular 
this  means  that  the  above  algorithms  have  direct  analogues  for  rings  of  polynomials  over 
R  and  C  or  their  fields  of  fractions  which  consist  of  rational  functions  over  these  fields. 
Some  of  the  specific  problems  have  solutions  in  slightly  more  general  settings  than  just  a 
UFD.  However  the  fraction-free  algorithms  only  make  sense  in  a  setting  where  removal  of 
common  factors  can  be  achieved.  A  unique  factorization  domain  is  an  appropriate  setting 

for  this.  .  ,  ,  . 

It  is  then  the  case  that  all  the  algorithms  described  earlier  for  the  integer  carry  over 

in  the  natural  way  to  such  a  ring:  all  “arithmetic”  being  replaced  by  its  corresponibng 
ring  operations  with  division  being  understood  to  mean  removal  of  common  factoia.  The 
resulting  algorithms  would  therefore  be  much  like  the  integer  algorithms  in  a  setting  where 
integers  are  represented  by  a  list  of  their  prime  factors.  Removal  of  common  factors  would 
then  literallv  mean  their  removal  from  tlie  corresjionding  lists. 

The  most  interesting  settings  for  these  more  general  algorithms  would  be  the  poly¬ 
nomial  rings  R[.t]  and  C  [.r]  for  which  more  complicated  fraction-free  GE  algorithms  are 
built  into  Computer  .41gebra  Systems  such  as  Maple.  The  similarity  of  the  algorithms  to 
those  for  the  integers  are  so  great  that  they  are  not  detailed  separately  heie. 

6.  Conclusions 

In  this  paper  we  have  presented  a  simplified  form  of  Gauss  elimination  for  fraction-free 
integer  computation.  This  algorithm  has  several  important  advantages  over  conventional 
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approaches: 

It  requires  no  searching  for  common  factors  —  The  factors  which  are  removed  are 
entirely  predictable  in  the  light  of  comparison  between  integer  GE  and  its  standard  real 

arithmetic  counterpart.  ,  r  , 

The  dynamic  range  growth  problem  is  substantially  alleviated  by  the  removal  of  these 

known  factors.  •  r  u 

The  diagonal  of  the  final  upper  triangular  matrix  consists  of  the  determinants  of  the 

principal  minors  of  the  original  matrix.  This  has  the  benefit  of  making  positive  definiteness 
(as  well  as  the  determinant)  easily  identifiable. 

The  range  growth  that  is  needed  is  no  worse  than  that  which  is  inherent  in  the  original 
matrix  in  the  sense  that  determinants  of  all  its  principal  minors  will  inevitably  need  to 
be  representable. 

The  algorithm  generalizes  in  a  completely  obvious  way  to  more  general  Unique  Fac¬ 
torization  Domains  —  including  rings  of  real  or  complex  polynomials  for  example. 
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